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Abstract 


The  Hough  transform  is  a  well  known  technique  for  detecting  parametric  curves 
in  images.  We  place  a  particular  group  of  Hough  transforms,  the  probabilistic  Hough 
transforms,  in  the  framework  of  importance  sampling.  This  framework  suggests  a  way 
in  which  probabilistic  Hough  transforms  can  be  improved:  by  specifying  a  target  dis¬ 
tribution  and  weighting  the  sampled  parameters  accordingly  to  make  identification  of 
curves  easier.  We  investigate  the  use  of  clustering  techniques  to  simultaneously  identify 
multiple  curves  in  an  image.  We  also  use  probabilistic  arguments  to  develop  stopping 
conditions  for  the  algorithm.  The  resulting  methodology  is  called  the  Importance  Sam¬ 
pling  Hough  Transform  (ISHT).  We  apply  our  method  to  both  simulated  and  real  data, 
and  compare  its  performance  with  that  of  two  much  used  versions  of  the  Hough  trans¬ 
form:  the  standard  Hough  transform  and  the  randomized  Hough  transform.  In  our 
experiments,  it  is  more  accurate  than  either  of  these  common  methods,  and  it  is  faster 
than  the  randomized  Hough  transform. 

KEY  WORDS:  Clustering,  Importance  sampling.  Hough  transform.  Probabilistic  Hough 
transform.  Target  distribution. 
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1  Introduction 


The  Hough  transform  is  a  well  known  technique  for  detecting  parametric  curves  in  images. 
We  place  a  particular  group  of  Hough  transforms,  the  probabilistic  Hough  transforms,  in  the 
framework  of  importance  sampling.  This  framework  suggests  a  way  in  which  probabilistic 
Hough  transforms  can  be  improved:  by  specifying  a  target  distribution  and  weighting  the 
sampled  parameters  accordingly  to  make  identihcation  of  curves  easier.  We  investigate  the 
use  of  clustering  techniques  to  simultaneously  identify  multiple  curves  in  the  image.  We  also 
use  probabilistic  arguments  to  develop  stopping  conditions  for  the  algorithm.  Results  from 
applying  our  method  and  two  popular  versions  of  the  Hough  transform  to  both  simulated 
and  real  data  are  shown. 


1.1  Notation  and  Definitions 

The  Hough  transform  (Hough  1962),  hereafter  HT,  is  typically  used  to  detect  object  bound¬ 
aries  in  images.  Before  applying  the  HT  to  a  particular  image,  the  image  must  be  transformed 
by  an  edge  detection  algorithm  into  a  binary  edge  detected  image  (also  referred  to  as  the 
image  space).  An  edge  detection  algorithm  assigns  each  pixel  to  be  a  foreground  pixel  {edge 
pixel).,  or  a  background  pixel.  In  all  the  examples  presented  here  the  foreground  pixels  are 
black  and  the  background  pixels  are  white.  We  will  use  the  term  points  to  refer  to  foreground 
pixels  and  the  term  pixels  to  refer  to  foreground  and  background  pixels.  We  dehne  N  to  be 
the  total  number  of  points  in  the  image  space. 

In  order  to  detect  instances  of  a  particular  parametric  curve  in  an  image,  one  must  decide 
upon  a  parameterization  of  the  curve.  We  will  denote  the  curve  parameterization  by  0.  The 
dimension  of  the  curve  will  be  denoted  by  p.  For  instance,  if  a  line  is  parameterized  in 
Cartesian  coordinates,  p  =  2,  and  0  =  {m,  c},  where  m  and  c  are  the  slope  and  intercept  of 
the  line,  respectively. 

Several  of  the  HTs  we  discuss  use  an  accumulator  array  to  detect  curves.  An  accumulator 
array,  denoted  by  A,  is  a  discretization  of  the  parameter  space,  0.  Each  cell  of  A  is  used  to 
store  votes  for  curves  whose  parameters  are  contained  within  the  cell. 


1.2  The  Standard  Hough  transform 

The  standard  HT  (SHT)  is  a  popular  method  for  detecting  parametric  curves  (lines,  circles, 
ellipses  etc)  in  binary  image  data.  Good  introductions  to  the  HT  and  surveys  of  the  literature 
can  be  found  in  Illingworth  and  Kittler  (1988)  and  Leavers  (1993).  We  will  briefly  describe 
the  SHT  for  line  detection.  Polar  coordinates  {p  =  xcosB  -|-  ysinO),  are  most  often  used  to 
parameterize  lines  (Duda  and  Hart  1972).  Given  this  parameterization,  the  parameters  of 
all  lines  passing  through  a  particular  point  {x',  y')  in  the  image  space  form  a  sinusoid  in  the 
parameter  space.  The  standard  HT  for  detecting  lines  proceeds  by  mapping  each  point  in 
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the  image  space  to  its  sinusoid  in  the  discretized  parameter  space  (the  accumulator  array, 
A).  Each  cell  in  the  accumulator  array  is  incremented  once  for  each  sinusoid  that  passes 
through  it.  A  peak  detection  method,  sometimes  a  simple  threshold,  is  used  to  locate  local 
peaks  in  the  accumulator  array.  The  location  of  each  peak  gives  the  parameters  of  each 
detected  line. 

1.3  The  Randomized  Hough  transform 

The  main  problems  of  the  standard  HT  are  its  long  computation  times  and  large  storage 
requirements.  The  long  computation  times  are  caused  by  the  fact  that  the  HT  increments 
the  cells  in  the  accumulator  array  corresponding  to  all  curves  that  pass  through  all  points  in 
the  image.  Thus,  much  of  the  computation  time  is  spent  storing  votes  for  curves  with  very 
little  support  from  the  data.  The  storage  requirements  of  the  standard  Hough  transform 
are  a  bigger  problem  though,  since  the  size  of  the  accumulator  array  is  exponential  in  the 
number  of  parameters.  When  the  number  of  parameters  is  greater  than  two,  the  storage 
space  requirements  become  excessive. 

A  new  class  of  HTs  called  probabilistic  Hough  transforms  (PHTs)  attempted  to  address 
these  problems  by  using  random  sampling  of  edge  points,  and  many-to-one  mapping  of  edge 
points  from  the  image  space  to  the  parameter  space.  The  simplest  PHT  is  the  randomized 
HT  (RHT)  due  to  Xu,  Oja,  and  Kultanen  (1990).  Below  we  outline  the  RHT  algorithm  for 
the  general  case  of  detecting  all  instances  of  a  particular  p-dimensional  curve  in  a  binary 
image. 

1.  Create  the  set  E  of  all  edge  points  in  the  binary  edge  detected  image. 

2.  Select  p  points,  (ei, . . . ,  Cp),  at  random  from  E. 

3.  Solve  for  the  parameters  {9)  of  the  curve  in  the  image  space,  dehned  by  the  selected 
points. 

4.  Increment  the  appropriate  cell,  A{9),  in  an  accumulator  array. 

5.  If  the  A{9)  exceeds  a  predehned  threshold  t,  then  the  curve  parameterized  by  9  is 
detected.  When  this  happens,  the  points  that  lie  on  the  curve  are  removed  from  E 
and  the  accumulator  array  is  re-initialized. 

6.  Regardless  of  whether  a  curve  is  detected,  check  whether  or  not  a  stopping  condition 
is  satished.  If  it  is  not,  return  to  step  2. 

The  two  main  ingredients  in  this  algorithm  (and  all  PHTs)  are  a  sampling  mechanism 
(steps  2-3)  and  a  peak  detection  method  (steps  4-5). 
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The  sampling  mechanism  dehnes  a  sampling  distribution  on  the  continuous  parameter 
space,  0.  However,  the  sampling  mechanism  is  discrete  in  nature,  so  we  can  view  the 
sampling  mechanism  as  dehning  a  discrete  sample  space  Q  C  0,  and  a  sampling  distribution 
(probability  mass  function),  g{-)^  on  fl.  In  the  RHT,  the  sampling  mechanism  is  the  selection 
of  p  points  at  random.  Thus,  if  there  are  K  =  (^)  p-tuples  of  points  and  di  is  the  parameter 
for  the  p-tuple,  then  =  {9i  , . . . , 9k}  and  the  sampling  distribution  can  be  written  as: 

1  ^ 
i—1 

If  all  the  0's  are  unique,  then  g{-)  is  a  uniform  distribution  on  fl,  i.e. 

g{9*)  =  ^  9*  en. 

Curves  that  are  present  in  the  image  should  correspond  to  regions  in  0  that  have  relatively 
large  probability  under  g.  The  peak  detection  method  of  the  RHT  achieves  this  by  grouping 
sampled  parameter  values  into  cells  in  the  accumulator  array  and  comparing  with  a  threshold. 

Applying  the  RHT  is  not  straightforward  if  the  curve  is  nonlinear  with  respect  to  its  pa¬ 
rameters,  in  which  case  selecting  p  points  does  not  necessarily  uniquely  dehne  a  p-dimensional 
curve.  An  ellipse  is  an  example  of  a  curve  that  is  nonlinear  with  respect  to  its  parameters. 

A  solution  to  the  problem  of  detecting  ellipses  within  the  framework  of  PHTs  has  been  ex¬ 
plored  by  McLaughlin  (1998)  (based  on  work  by  Yuen,  Illingworth,  and  Kittler  (1989)  in  a 
non-PHT  framework).  In  his  method,  three  points  are  selected  at  a  time.  Estimates  of  the 
tangents  to  the  (possible)  ellipse  at  these  points  are  then  calculated  and  used  to  interpolate 
the  center  of  the  ellipse.  Given  the  location  of  the  ellipse  center  the  remaining  parameters 
are  easily  determined. 

A  greater  problem  affecting  the  RHT  is  that  its  performance  is  poor  when  the  image  is 
noisy  or  complex.  New  PHTs  have  been  proposed  to  improve  the  performance  of  the  RHT. 
The  simplest  modihcation  to  the  RHT  is  the  RHT  with  point  distance  criterion  (RHTJD). 

In  this  HT,  at  each  iteration,  one  point  is  sampled  at  random  and  then  p  —  1  points  are 
sampled  uniformly  from  all  points  that  are  within  certain  a  distance  (greater  than  dmin  and 
less  than  dmax)  of  the  hrst  point.  Sensible  choices  of  dmin  and  dmax  lead  to  an  increase  in 
the  probability  of  sampling  points  on  a  curve  present  in  the  image. 

Comparisons  of  various  probabilistic  (and  non-probabilistic)  HTs  can  be  found  in  Kalviainen, 
Hirvonen,  Xu,  and  Oja  (1995)  and  Kalviainen  and  Hirvonen  (1997).  These  methods  include 
the  RHT_D  mentioned  above,  the  Dynamic  RHT,  the  Random  Window  RHT,  the  Window 
RHT,  the  Connective  RHT,  and  the  Dynamic  Combinatorial  HT.  Most  of  these  modihca- 
tions  to  the  RHT  attempt  to  improve  the  sampling  distribution  in  various  ways  so  that  it  is 
more  peaked  around  the  parameters  corresponding  to  the  curves  in  the  image,  thus  making 
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the  subsequent  process  of  peak  detection  easier.  In  some  cases  the  distinction  between  the 
sampling  mechanism  and  peak  detection  method  is  blurred,  e.g.  in  the  DRHT  where  curves 
are  detected  by  an  iterative  process  involving  two  RHTs. 

While  all  of  these  methods  differ  in  the  sampling  distribution  employed  they  all  share  two 
common  traits:  (i)  they  all  detect  curves  sequentially,  and  (ii)  curves  are  detected  when  the 
number  of  votes  in  a  cell  in  the  accumulator  array  exceeds  a  certain  threshold,  or  in  other 
words,  when  a  curve  has  been  sampled  a  certain  number  of  times.  We  see  these  as  areas 
where  improvements  can  be  made.  When  detecting  curves  sequentially,  every  time  a  curve 
is  detected  the  accumulator  array  is  reset,  thus  discarding  other  curves  that  have  already 
been  accumulated.  When  accumulating  a  curve  in  the  accumulator  array,  typically  no  effort 
is  made  to  assess  the  “quality”  of  the  entire  curve.  A  practical  consequence  of  this  is  that  a 
curve  must  be  sampled  many  times  to  be  detected. 

In  our  approach  we  will  use  a  criterion  for  judging  the  “quality”  of  a  curve,  and  introduce 
a  technique  that  allows  detection  of  multiple  curves  simultaneously.  The  form  of  this  criterion 
is  suggested  by  considering  the  following  question:  “What  distribution  would  I  ideally  wish 
to  sample  from?”  If  we  can  define  an  ideal  sampling  distribution  and  obtain  a  sample  from 
5'(-),  then  we  can  obtain  a  sample  from  the  ideal  distribution  via  the  technique  of  importance 
sampling. 

In  the  following  sections  we  introduce  importance  sampling  and  then  show  how  it  relates 
to  the  HT.  We  present  some  simple  rules-of-thumb  for  determining  how  many  samples  from 
the  sampling  distribution  are  needed  and  suggest  using  clustering  techniques  to  simultane¬ 
ously  identify  curves.  Results  from  applying  these  ideas  to  both  simulated  and  real  data  are 
shown,  and  our  method  is  compared  to  two  standard  HTs. 


2  The  Importance  Sampling  Hough  Transform 

2.1  Importance  Sampling 

Importance  sampling  is  a  technique  that  can  be  useful  when  a  sample  from  a  particular 
target  distribution  is  desired  but  simulation  from  that  distribution  is  not  straightforward. 
We  will  restrict  our  attention  to  the  case  where  the  target  distribution  is  discrete  and  known 
only  up  to  a  multiplicative  scale  factor. 

Consider  a  discrete  random  variable,  0,  with  probability  mass  function  {7r(0)  :  0  G 
where  is  the  sample  space  of  9.  We  wish  to  obtain  a  sample  0i, . . . ,  from  7r(-).  We  know 
the  form  of  7r(-)  up  to  a  scale  factor,  i.e.: 

7r{9)  =  f{9)/c,  9  e  n, 

where  the  form  of  /(•)  is  known  but  the  normalizing  constant  c  is  unknown.  We  now  assume 


7 


that  there  exists  a  probability  mass  function  g{-)  (the  sampling  distribution  or  importance 
sampling  function)  dehned  on  Q  from  which  we  can  draw  a  sample,  9i, . . . ,  9t-  Each  obser¬ 
vation  in  this  sample  is  then  weighted  as  follows: 

These  weights  are  called  the  importance  weights.  If  a  random  (unweighted)  sample  from 
the  target  distribution  is  required  (e.g.  for  plotting  purposes),  then  sampling /importance 
resampling  (SIR)  (Rubin  1987)  can  be  used.  This  consists  of  simply  resampling  the  sampled 
parameters,  with  replacement,  with  probabilities  proportional  to  their  importance  weights. 

2.2  The  Algorithm 

We  now  outline  an  algorithm  for  a  new  PHT  that  can  be  viewed  in  an  importance  sampling 
framework.  We  call  it  the  Importance  Sampling  Hough  Transform  (ISHT).  The  algorithm 
to  detect  curves,  parameterized  by  0,  in  a  binary  edge  detected  image  proceeds  as  follows. 

1.  Create  the  set  E  of  all  edge  points  in  the  binary  edge  detected  image. 

2.  Dehne  a  target  distribution  f{9\E).  This  is  a  function  that  measures  the  “quality”  or 
“goodness-of-ht”  of  the  curve  given  by  9,  to  the  edge  points  E.  For  convenience  we 
shall  write  f{9). 

3.  Obtain  a  random  sample,  or  batch,  of  parameters,  {9i, . . .  ,9t-},  of  size  T*  from  a 
sampling  distribution  g{-). 

4.  For  each  sampled  parameter,  9i,  in  the  batch,  calculate  its  importance  weight  as: 

f{0i) 

Wi  =  - TfT, - . 

Ej=i  f{0j) 

(For  simplicity  we  have  assumed  that  g{-)  is  approximately  uniform  over  its  range  and 
thus  does  not  appear  in  the  above  equation.  See  Sections  2.3  for  discussion  of  the 
implications  of  this  assumption.) 

5.  Run  a  peak  detection  method  on  the  weighted  sample  of  parameters  to  identify  the 
curve  parameters.  Remove  the  points  corresponding  to  each  curve  identihed  from  E. 

6.  Check  whether  or  not  certain  stopping  conditions  have  been  satished.  If  not  return  to 
step  3. 

The  four  main  components  of  this  algorithm  are:  a  sampling  distribution,  a  target  distri¬ 
bution,  a  peak  detection  method,  and  suitable  stopping  conditions.  Each  of  these  components 
is  discussed  below  in  greater  detail. 
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2.3  Importance  Sampling  Distribution 

Once  a  sampling  mechanism  is  chosen  it  dehnes  the  importance  sampling  distribution,  g{-). 
As  with  all  PHTs,  the  sampling  mechanism  is  vital  to  an  efficient  HT.  If  an  image  is  moder¬ 
ately  complex  (e.g.  several  curves  present)  then  the  simplest  sampling  mechanism  (sampling 
p  points  at  random  from  the  entire  image)  will  rarely  sample  curve  parameters.  The  easiest 
modihcation  to  this  sampling  mechanism  is  to  use  the  point  distance  criterion  (see  Section 
1.3).  However,  even  though  this  sampling  mechanism  is  easy  to  implement,  exact  calculation 
of  g{-)  is  difficult.  This  is  also  true  of  more  complex  sampling  mechanisms.  In  these  cases  we 
make  the  simplifying  assumption  that  the  sampling  distribution  is  approximately  uniform 
and  therefore  cancels  from  the  calculation  of  the  importance  weights  (i.e  Wi  oc  This 

assumption  has  little  effect  on  the  performance  of  the  Hough  transform. 

2.4  Target  Distribution 

The  selection  of  a  good  target  distribution  is  crucial  to  the  success  of  the  HT.  A  good  target 
distribution  will  have  a  large  concentration  of  its  mass  on  the  parameters  corresponding  to 
curves  present  in  the  image.  The  target  distributions  we  consider  are  of  the  form: 


f{0i)  =  m)  X 


where  /3{9i)  is  a  measure  of  the  number  of  points  associated  with  the  curve  given  by  9i,  and 
a(9i)  is  a  measure  of  the  goodness  of  ht  of  these  points  to  the  curve. 

We  considered  several  functions  for  j3(-). 

1.  j3(9i)  =  rii  X  where  n*  is  the  number  of  points  that  lie  on  the  curve  given  by 

9i,  and  is  a  predehned  threshold.  If  «(•)  =  1,  then  this  can  be  thought  of  as  the 
target  distribution  corresponding  to  the  SHT. 

2.  j5{9i)  =  where  Vj  is  the  minimum  distance  from  the  point  to  the  curve 

given  by  and  W{rj)  is  a  weighting  function. 

This  weighting  function  can  be  interpreted  as  a  fuzzy  membership  function  with  mem- 
bership  radius  i?  if  IT(0)  =  1,  W{r)  decreases  monotonically  from  0  to  R,  and  IT(r)  =  0  for 
r  >  R  (Han,  Koczy,  and  Poston  1994).  Natural  choices  for  W{r)  are: 


1.  Step-function: 
W(r)  =  I  J- 


0  <  r  <  i? 
else 
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2.  Gaussian: 

r  0<r<R 

\  0,  else 


3.  Linear: 
W{r)  = 


I  l-r/R, 

I  0^ 


0  <  r  <  R 
else 


Note  that  for  values  of  R  between  |  and  if  one  uses  the  Step  weighting  function  then 
=  12^=1  ^(g)  ~  ^*5  the  number  of  points  that  lie  on  the  curve  . 

The  various  goodness-of-ht  criteria  we  considered  were: 


1.  a{9i)  =  {rii/ci)^,  where  n*  is  dehned  as  above  and  where  Cj  is  the  number  of  pixels  that 
lie  on  the  curve.  G  is  a  positive  number.  For  G  =  1,  this  is  the  proportion  of  pixels  on 
the  curve  that  are  points. 

2.  a{9i)  =  (pi/ci)^.  Each  point  within  distance  membership  radius  R  of  the  curve  is 
projected  onto  the  curve.  The  number  of  these  projected  points  is  pi  and  Cj  is  dehned 
as  above. 

3.  a{9i)  =  X  l{(pi/cOG>t<,},  for  some  threshold  ta- 

All  of  the  above  criteria  lie  between  0  and  1.  If  G  >  1  then  it  can  be  thought  of  as  a 
penalty  term,  penalizing  “bad”  curves.  However  if  G  <  1  then  the  goodness-of-ht  of  each 
curve  is  boosted.  The  hrst  goodness-of-ht  criterion  listed  is  very  strict.  It  favors  only  “perfect 
curves”  whereas  the  second  criterion  allows  variation  around  the  curve.  The  last  criterion 
gives  zero  weight  to  curves  that  do  not  exceed  a  certain  goodness-of-ht  threshold. 

Obviously  there  is  a  great  variety  of  target  distributions  that  can  be  used.  The  application 
will  play  a  large  role  in  determining  which  target  distributions  are  suitable. 


2.5  Peak  Detection  in  the  Parameter  Space  via  Clustering 

Given  a  sample  of  parameters  from  the  sampling  distribution,  and  their  importance  weights, 
a  peak  detection  method  must  be  used  to  determine  the  number  of  curves,  and  their  pa¬ 
rameters.  This  simultaneous  detection  of  peaks  in  the  parameter  space  can  be  thought  of  as 
a  clustering  problem.  We  wish  to  partition  the  sampled  parameters  into  spatially  compact 
groups,  where  each  group  represents  possible  parameters  for  a  particular  curve  in  the  image. 
The  center  of  each  group  will  be  our  estimate  of  the  parameters  for  that  curve.  Of  course 
some  sampled  parameters  do  not  correspond  to  any  curve  in  the  image.  These  parameters 
can  be  thought  of  as  noise.  The  number  of  these  sorts  of  parameters  in  the  sample  will 
depend  on  the  efhciency  of  the  sampling  distribution.  However,  these  parameters  should  all 
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have  low  importance  weights.  They  can  be  removed  by  thresholding  the  sample  based  on 
the  importance  weights.  Once  these  noise  parameters  are  removed,  the  remaining  sample 
of  parameters  can  be  clustered  using  any  one  of  numerous  clustering  methods.  These  range 
from  the  simple  A:-means  clustering  (Hartigan  1975)  (for  a  range  of  values  of  k)  to  more 
complex  hierarchical  and  model-based  methods,  such  as  MCLUST  (Banheld  and  Raftery 
1993;  Fraley  and  Raftery  1998). 

We  will  use  a  simple  method  similar  in  spirit  to  the  peak  detection  method  used  in  the 
RHT.  First,  we  threshold  the  sampled  parameters  based  on  their  importance  weights.  Each 
of  the  remaining  parameters  are  assumed  to  be  associated  with  a  curve  in  the  image.  We  then 
place  each  parameter  in  a  cell  of  an  accumulator  array  (as  in  the  RHT)  and  run  a  connected 
components  algorithm  on  the  non-empty  cells  in  the  array.  The  parameters  contained  in 
the  cells  associated  with  a  particular  connected  component  are  assumed  to  be  associated 
with  one  curve  in  the  image.  We  estimate  the  parameters  of  this  curve  by  taking  a  weighted 
average  of  all  the  parameters  associated  with  the  component  (where  the  weights  used  are  the 
importance  weights,  Wi).  Choosing  the  cell  size  appropriately  is  necessary  for  the  curves  to 
be  detected  accurately.  This  requirement  is  common  to  all  PHTs  proposed  to  date,  though. 

2.6  Stopping  Conditions 

One  advantage  of  our  approach  over  other  PHTs  is  that  the  we  do  not  have  to  sample  the 
points  that  lie  on  a  curve  many  times  in  order  to  detect  the  curve.  This  is  because  our 
algorithm  incorporates  a  measure  of  the  quality  of  each  curve.  As  a  result  of  this,  simple 
probabilistic  arguments  can  be  used  to  develop  stopping  conditions  for  the  HT. 

Our  notation  will  be  as  follows.  Let  h  be  the  number  of  batches  currently  processed, 
T  =  hT*  be  the  total  number  of  parameters  sampled,  and  ho  be  the  current  number  of 
consecutive  batches  processed  without  detecting  a  single  curve.  Let  the  current  number  of 
points  in  the  edge  set,  E,  be  W- 

We  propose  two  different  stopping  conditions  to  determine  if  a  sufhcient  number  of  pa¬ 
rameters  have  been  sampled  from  g{-).  One  condition  is  to  stop  the  HT  if  the  total  number 
of  parameters  sampled,  T,  exceeds  some  threshold,  Tmax-  An  alternative  rule  is  to  stop  the 
HT  if  the  number  of  consecutive  batches  processed  without  detecting  a  single  curve,  bo,  is 
greater  than  some  threshold,  Of  course  if  the  number  of  remaining  edge  points  becomes 

zero  (or  undesirably  low)  then  sampling  should  stop. 


Stopping  Condition  1:  Total  number  of  parameters  sampled 

The  total  number  of  parameters  sampled,  T,  will  be  sufhcient  if  the  probability  of  sampling 
all  curves  present  in  the  image  at  least  once  is  high.  We  can  write  down  this  probability  if 
we  have  some  idea  of  the  number  and  size  of  the  curves  in  the  image.  For  example  consider 
an  image  containing  M  p-dimensional  curves  of  n  points  each.  Let  P{n,N}  be  the  probability 
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of  sampling  p  points  all  from  a  n  point  curve,  given  that  the  image  contains  a  total  of  N 
points.  The  probability  of  having  sampled  all  curves  after  T  samples  is: 

Pr  =  l-  (  1  -  *  •  P{n,N}  f  . 

This  probability  can  be  plotted  as  a  function  of  T  and  can  then  be  used  to  gauge  what 
would  constitute  a  reasonable  sample.  Derivation  of  this  probability  can  be  found  in  Ap¬ 
pendix  4. 

If  the  sampling  mechanism  is  simply  sampling  p  points  at  random  then 

_  0 

P{n,N}  —  • 

If  one  uses  the  point  distance  criterion  then  this  probability  can  be  approximated  by 

_  n  (”_i) 

P{n,N}  ~  jY  ^  ' 

Here  n'  is  the  expected  number  of  points  that  would  typically  lie  on  a  given  curve  that 
would  satisfy  the  point  distance  criterion  given  that  the  hrst  point  selected  lies  on  that  given 
curve.  Similarly  N'  is  the  total  number  of  points  that  typically  satisfy  the  point  distance 
criterion  given  selection  of  the  hrst  point.  This  condition  will  result  in  a  conservative  stopping 
condition  (provided  the  guesses  for  M, n,  and  N  are  good).  This  is  because  the  above 
calculation  does  not  take  into  account  the  fact  that,  as  curves  are  detected  and  removed 
from  the  image,  the  probability  of  detecting  the  remaining  curves  increases. 


Stopping  Condition  2:  Number  of  batches  processed  since  last  detection 

Consider  the  number  of  consecutive  batches  processed  without  detecting  a  single  curve,  ho. 
If  this  number  is  not  zero,  then  we  should  cease  sampling  if  the  probability  of  not  having 
sampled  a  curve  over  these  hoT*  samples,  given  that  one  remains,  is  low.  Call  this  probability 
5.  This  can  be  calculated  as  follows: 


^  —  (1  P{n,Ni,})  ■ 

Sampling  should  stop  if  d  is  low  enough  (less  than  a  predehned  threshold  5o,  say),  or 
alternatively  if: 


rmax 

^0 


1  log  ^0 
T*  log(l  -  P{nM}) 


<  bo- 
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Figure  1:  Simulated  256  x  256  binary  image  containing  10  lines  and  1%  speckle  noise. 

This  stopping  condition  is  preferable  to  the  hrst  one,  since  it  uses  more  information  about 
the  current  state  of  the  HT  (e.g.  how  many  curves  have  been  detected,  which  edge  points 
remain)  and  should  result  in  a  smaller  sample  size  being  needed.  Discussion  of  the  selection 
of  the  batch  size  can  be  found  in  Section  4. 


3  Examples 

In  this  section  we  show  some  results  obtained  by  applying  our  methods  to  both  real  and 
simulated  data.  We  compare  our  method  (ISHT)  to  the  popular  randomized  Hough  trans¬ 
form  with  point  distance  criterion  (RHT_D),  and  to  the  original  standard  Hough  transform 
(SHT). 

3.1  Simulated  Data 

We  simulated  a  256  x  256  binary  image,  containing  10  lines  (see  Figure  1).  In  this  image 
we  randomly  changed  the  color  of  each  pixel  with  probability  0.01.  The  total  number  of 
points  in  the  image  is  2809,  while  the  total  number  of  points  associated  with  lines  is  2192. 
However,  the  probability  of  sampling  a  pair  of  points  at  random  both  from  the  same  line  is 
only  about  0.08. 
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3.1.1  Implementation 

For  the  ISHT,  we  chose  the  sampling  distribution  corresponding  to  the  sampling  mechanism 
of  the  RHT  with  point  distance  criterion  with  dmin  =  1,  and  d^ax  =  50,  i.e.  the  two  pixels 
selected  must  be  within  50  pixels  of  each  other.  The  components  of  our  target  distribution 
were  given  by: 

•  13 {9i)  =  where  W{-)  is  the  Gaussian  weighting  function  with  R  =  2,  and 

a  =  2. 

•  0:{9i)  iPi/c-i)  X  l{(pj/ci)>0.5}- 

This  target  distribution  allows  points  within  2  pixels  of  the  line  to  be  associated  with  the 
curve,  but  penalizes  curves  that  do  not  have  at  least  half  as  many  points  near  the  line  as 
pixels  on  the  line.  We  used  a  peak  detection  method  with  an  importance  weight  threshold 
of  zero.  The  cell  size  of  the  accumulator  array  (in  {p,  9}  notation)  was  1.8  x  The  batch 
size  was  50,  and  the  HT  was  stopped  if  two  consecutive  batches  failed  to  detect  any  curves. 

For  the  RHTJD,  we  set  the  threshold  to  be  10  points,  and  used  the  same  point  distance 
criterion  as  the  ISHT  above.  We  ceased  sampling  when  the  number  of  points  remaining  in 
the  image  was  less  than  30%  of  the  original  total.  The  cell  size  of  the  accumulator  array  was 
1.8  X  When  removing  detected  curves  from  the  image  we  removed  all  points  within  a  3 
pixel  radius  of  the  curves. 

For  the  SHT,  the  threshold  was  set  at  130  points.  The  cell  size  of  the  accumulator  array 
was  3.6  X  After  thresholding  the  accumulator  array  (all  cells  below  the  threshold  were 
set  to  zero),  we  ran  a  connected  components  algorithm  on  the  non-empty  cells.  The  cell 
with  the  most  votes  inside  each  connected  component  was  identihed  as  corresponding  to  a 
curve  present  in  the  image.  The  parameters  of  the  curve  were  given  by  the  parameters  of 
the  center  of  the  cell.  This  curve  identihcation  procedure  is  similar  to  our  ISHT  clustering 
algorithm. 

3.1.2  Results 

The  lines  detected  by  each  method  are  shown  in  Figure  2.  Each  method  detected  all  10  lines 
in  the  image  with  no  false  positives,  but  the  three  methods  differed  substantially  in  speed 
and  accuracy.  Table  1  shows  the  run  times  of  each  method,  and  the  mean  squared  errors  for 
p  and  9. 

In  terms  of  accuracy,  the  ISHT  was  clearly  the  best.  The  MSE  for  p  was  nearly  an  order 
of  magnitude  better  for  the  ISHT  than  for  the  RHT_D,  and  two  orders  of  magnitude  better 
than  for  the  SHT.  The  MSE  for  9  for  the  ISHT  was  4  times  lower  than  for  either  of  the  older 
methods.  The  SHT  was  the  fastest  method,  approximately  3  times  faster  than  the  ISHT, 
which  was  an  order  of  magnitude  faster  than  the  RHT_D.  This  is  not  too  surprising  given 
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Table  1:  Simulated  Data:  Run  times,  and  mean  squared  errors  for  each  HT. 

ISHT  RHT_D  SHT 
Run  Time  (seconds)  4.73  44.64  1.770 

MSEofyO  0.061  0.362  5.293 

MSEof0(xlO5)  9.02  40.91  45.65 


that  the  parameter  space  was  of  low  dimension  (high  dimension  adversely  affects  the  SHT 
more  than  the  RHT),  and  that  the  image  was  complex  (the  complexity  of  the  image  affects 
the  RHT  more  than  the  SHT). 


3.1.3  Stopping  Conditions 

For  this  example  we  used  the  stopping  condition  based  on  the  number  of  consecutive  empty 
batches.  Specihcally,  we  stopped  the  ISHT  when  we  had  two  empty  batches,  each  of  size 
50  parameters.  The  hnal  image,  after  the  detected  lines  had  been  removed,  contained  692 
points.  It  would  be  interesting  to  calculate  the  probability  of  not  detecting  a  line,  given 
that  one  existed,  in  these  remaining  100  (50  x  2)  samples.  Suppose,  for  simplicity,  that  our 
sampling  mechanism,  had  been  that  of  the  RHT,  and  the  undetected  line  contained  100 
points.  In  this  case,  the  probability  of  sampling  two  points  on  this  line  is: 


=  (100/692)2 
=  0.0209. 

Therefore  the  probability  of  not  detecting  the  curve  is: 


Pr  =  {1  -  P{n,N,}y°° 

=  (1  -  0.0209)^“ 

=  0.12. 

This  probability  is  low,  although  not  very  low.  It  suggests  that  maybe  we  should  run  our 
HT  longer.  The  alternative  stopping  condition,  based  on  the  total  number  of  parameters 
sampled,  is  more  conservative.  Figure  3  shows  the  probability  of  sampling  all  curves  in 
the  image  for  a  given  total  sample  size,  assuming  there  are  either  10  or  20  lines  in  the 
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Figure  3:  Simulated  Data:  Probability  of  sampling  all  lines  in  the  image  by  a  given  iteration, 
assuming  there  are  10  or  20  lines  in  the  image. 

image  (where  each  line  is  of  size  256  points).  The  probability  for  both  curves  is  high  when 
the  number  of  parameters  sampled  is  approximately  750.  In  this  example  we  sampled  250 
parameters  in  total,  and  we  can  see  that  the  probability  is  only  approximately  0.5.  Since 
we  had  detected  all  the  curves  in  the  image  after  250  samples,  it  appears  that  this  stopping 
condition  can  be  too  conservative;  however,  it  is  a  good  rule  of  thumb  for  establishing  an 
upper  bound  for  the  number  of  parameters  sampled. 


3.2  Blood  Cell  Data 

Figure  4  shows  a  265  x  272  image  of  blood  cells.  The  edge  detected  image,  obtained  via 
a  Sobel  transform,  is  shown  in  Figure  5.  The  task  of  identifying  each  blood  cell  is  not 
straightforward.  There  are  numerous  blood  cells  in  the  image.  All  are  roughly  circular, 
although  not  perfectly  so,  and  several  are  occluded,  either  by  other  blood  cells  or  by  the 
boundary  of  the  image. 


3.2.1  Implementation 

We  used  the  same  circle  parameterization  for  each  HT,  namely  0  =  {r,  a,  6},  where  r  is 
radius  of  the  circle,  and  {a,  b}  is  the  row  and  column  position  of  the  circle  center  (all  in 
pixels). 

For  the  ISHT,  we  chose  a  batch  size  of  200  samples.  The  sampling  mechanism  and 
target  distribution  are  the  same  as  in  the  previous  example.  Again  we  clustered  the  sampled 
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Figure  4:  Light  microscope  image  of  red  blood  cells.  Original  image  from  The  Image  Pro¬ 
cessing  Handbook^  (Russ  1999).  Used  with  permission. 
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parameters  after  each  batch  using  our  peak  detection  method.  The  cell  size  used  was  (in 
{r,  a,  6}  notation)  0.8  x  11  x  11  and  the  sample  was  thresholded  based  on  an  importance 
weight  of  45.  The  ISHT  was  stopped  if  4  consecutive  batches  detected  no  curves. 

For  the  RHT_D,  we  set  the  threshold  of  the  RHT_D  to  be  10  points,  and  used  the  same 
point  distance  criterion  as  the  ISHT  above.  The  size  of  a  cell  in  the  accumulator  array  was 
0.8  X  5  X  5.  As  in  the  previous  example,  we  stopped  sampling  when  the  number  of  points 
remaining  in  the  image  was  less  than  30%  of  the  original  total,  and  we  used  a  3  pixel  radius 
when  determining  which  points  should  be  removed  after  a  curve  had  been  detected. 

In  order  to  alleviate  the  severe  storage  requirements  of  the  SHT  once  the  number  of 
parameters  reaches  3,  as  here  with  circles,  we  employed  the  Gerig-Klein  modihcation  to  the 
HT  (Gerig  and  Klein  1986).  This  modihed  HT  allows  only  one  circle  to  be  detected  at  each 
given  center.  The  reduces  the  accumulator  array  from  a  three-dimensional  array  to  two 
two-dimensional  arrays.  The  cost  is  the  inability  to  detect  concentric  circles.  Of  course  in 
this  example  that  is  a  reasonable  assumption. 


3.2.2  Results 

The  hnal  classihcations  for  each  method  are  shown  in  Figure  6.  We  considered  26  of  the 
blood  cells  in  the  image  to  be  detectable.  The  number  of  cells  correctly  detected,  along  with 
the  run  times,  the  number  of  false  positives,  and  the  number  of  duplicates  are  shown  in 
Table  2. 


Table  2:  Blood  Cell  Analysis. 


ISHT 

RHT_D 

SHT 

Run  time 

44.90 

263.6 

138.9 

Cells  detected 

21 

23 

20 

Cells  undetected 

5 

3 

6 

False  positives 

0 

13 

0 

Duplicates 

0 

2 

5 

The  ISHT  detected  21  of  the  26  blood  cells,  detected  no  false  positives  and  no  duplicates, 
and  had  by  far  the  shortest  run  time.  The  RHT_D  detected  more  blood  cells  (23),  but  also 
detected  13  false  positives.  The  SHT  detected  fewer  blood  cells  (20),  and  also  detected  5 
duplicates.  We  can  see  by  visually  comparing  the  images  in  Figure  6  that  the  circles  detected 
by  the  ISHT  ht  the  blood  cells  better.  The  ISHT  did  not  detect  several  of  the  cells  that  are 
not  fully  within  the  image,  as  well  as  the  two  partially  occluded  cells  that  are  fully  contained 
within  the  image.  However,  with  suitable  modihcations  of  the  target  distribution  these  cells 
could  conceivably  be  detected. 
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(a)  ISHT  -  All  circles  corresponding  to  sam-  (b)  ISHT  -  Circles  detected  after  clustering. 

pled  parameters  with  positive  importance 

weight. 


(c)  RHTJD  -  Circles  detected.  (d)  SHT  -  Circles  detected. 

Figure  6:  Blood  Cells:  Circles  Detected. 
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4  Discussion 


We  have  found  importance  sampling  to  be  a  natural  framework  with  which  to  design,  view, 
and  understand  probabilistic  Hough  transforms.  We  feel  that  the  notion  of  a  target  dis¬ 
tribution,  which  defines  an  importance  weight  associated  with  every  sampled  parameter,  is 
fundamental  to  the  successful  implementation  of  a  PHT.  We  investigated  the  feasibility  of 
detecting  multiple  curves  from  a  given  batch  of  sampled  parameters.  We  found  that  sim¬ 
ple  clustering  methods  provided  good  identification  of  curve  parameters,  provided  that  the 
sampled  parameters  not  associated  with  curves  present  in  the  image  were  filtered  out.  The 
removal  of  these  “noise”  parameters  was  achieved  via  a  simple  thresholding  of  the  importance 
weights.  Probabilistic  arguments  can  be  used  to  determine  reasonable  stopping  conditions 
for  PHTs. 

As  with  most  data  processing  algorithms,  their  successful  implementation  often  depends 
on  correctly  setting  several  tuning  parameters.  We  feel  that,  depending  on  the  given  image, 
the  tuning  parameters  of  the  ISHT  are  reasonably  intuitive  to  set.  One  parameter,  the  batch 
size  (if  one  is  employing  the  second  stopping  condition)  can  affect  the  efficiency  and  accuracy 
of  the  ISHT.  If  the  batch  size  is  too  large,  the  increased  efficiency  of  sampling  from  a  smaller 
pool  of  points  is  lost.  Conversely,  if  the  batch  size  is  too  small,  curves  may  be  detected  after 
only  being  sampled  once,  thus  increasing  the  bias  in  the  parameter  estimates.  One  possible 
way  to  avoid  this  problem  is  to  use  a  small  batch  size,  and  employ  an  algorithm  similar 
to  the  Dynamic  RHT.  That  is,  for  each  detected  curve,  select  all  the  points  near  the  curve 
and  perform  a  refined  HT  on  these  points.  This  technique  can  be  thought  of  as  adaptively 
sampling  the  parameter  space,  and  it  is  of  course  not  the  only  conceivable  approach  to  take. 

The  importance  sampling  framework  is  broad  enough  to  incorporate  many  of  the  benefi¬ 
cial  features  of  other  existing  PHTs.  For  example,  we  used  a  very  simple  sampling  distribu¬ 
tion,  whereas  one  could  use  a  more  sophisticated  sampling  distribution,  e.g.  one  based  on  the 
Connective  RHT.  In  addition,  the  identification  of  curves  could  be  improved  by  incorporating 
an  adaptive  sampling  mechanism,  such  as  in  the  Dynamic  RHT.  The  main  computational 
burden  of  our  approach  over  other  PHTs  is  the  evaluation  of  the  target  distribution.  This 
should  be  an  0{N)  operation,  where  N  is  the  total  number  of  points  in  the  image.  We  feel 
that  the  benefit  of  evaluating  the  quality  of  a  curve  far  outweighs  the  cost  of  its  computation. 
In  summary,  we  feel  that  importance  sampling  provides  a  framework  in  which  different  PHTs 
can  be  understood,  and  which  will  be  useful  in  guiding  the  design  of  better  curve  detection 
methods. 
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Appendix  A:  Calculation  of  the  Probability  of  Detecting 
All  Curves 

Given  M  events,  Aq,  . . .  the  probability  of  the  union  of  all  of  these  events  can  be 

expressed  in  the  following  way: 


P{ufl,Ai)  =  P{A,)  +  ...  +  P{Am) 

i<j 

PiAnA^nAk)} 

i<j<k 


Now  consider  an  image  containing  M  p-dimensional  curves  of  n  points  each.  Assume  a 
sampling  distribution,  g{-)^  exists  that  enables  one  to  select  p  points  from  the  N  total  points 
in  the  image. 

Now  consider  a  sampling  scheme  where,  at  each  draw,  the  p  points  are  sampled  from 
the  N  total  points  using  the  sampling  distribution.  If,  at  a  particular  draw,  all  p  points  are 
sampled  from  the  same  curve,  we  shall  consider  that  curve  to  have  been  sampled.  Define 
P{n,N}  to  be  the  probability  of  sampling  a  curve  at  a  particular  draw  (we  assume  for  simplicity 
this  is  the  same  for  all  curves). 

Let  A\  be  the  event  that  the  curve  has  not  been  sampled  after  t  draws  of  p  points 
from  the  sampling  distribution.  Since  each  curve  has  n  points  and  each  sampled  parameter 
is  drawn  independently  from  g{-),  we  can  write: 

=  (1  -  P{n,N])\  for  i  =  1, . . . ,  M. 

Similarly  P{A\  fl  Ap  can  be  calculated  as  follows: 


P{A\  n  AJ) 


(P(A,^nA]))‘ 

(1  -  P{{A]  n  A]r)y 
(1  -  p{{Air  u  {A]r)Y 

(1  -  {p{{Air) + p{{A]r)  -  p{{Aiy  n  (A])^)))‘ 
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(1  “  P{n,N]  —  P{n,N]  +  0)* 

(1  -  2  AT})*,  for  alH  =  1, . . . ,  M,  j  7^  i. 


The  term  P{{AjY  fl  (AjY)  is  equal  to  zero  since  {Ajy  and  (AjY)  are  mutually  exclu¬ 
sive  (one  cannot  sample  two  curves  at  the  same  time).  Similarly,  the  probability  of  the 
intersection  of  all  events  can  be  written  as: 


Substituting  these  expressions  into  the  first  equation  we  find  that  the  probability  of  not 
sampling  all  curves  at  least  once  after  t  samples  have  been  drawn  from  g{-)  is: 

F(U",A‘)  =  1  -  ’■!’(".»)  )'• 

The  probability  of  having  sampled  all  curves  after  t  samples  is  simply  this  quantity 
subtracted  from  1. 
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